Data-Oriented Language Processing. An Overview

Authors

  • Rens Bod
  • Remko Scha
Abstract

Data-oriented models of language processing embody the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract grammar rules. Such models therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. This paper motivates the idea of data-oriented language processing by considering the problem of syntactic disambiguation. One relatively simple parsing/disambiguation model that implements this idea is described in some detail. This model assumes a corpus of utterances annotated with labelled phrase-structure trees, and parses new input by combining subtrees from the corpus; it selects the most probable parse of an input utterance by considering the sum of the probabilities of all its derivations. The paper discusses some experiments carried out with this model. Finally, it reviews some other models that instantiate the data-oriented processing approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting subtrees from the corpus or employ different disambiguation strategies; other models use richer formalisms for their corpus annotations.
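
The parse-probability idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: trees are encoded as nested Python tuples, a fragment's probability is its relative frequency among all corpus fragments with the same root label, and a derivation's probability is the product of its fragments' probabilities. The function names (`fragments_at`, `fragment_counts`, `parse_probability`) and the toy corpus are hypothetical, and the exhaustive enumeration is only practical for very small trees.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a node is (label, child, ...); a word is a str;
# an open substitution site in a fragment is a bare (label,) tuple.

def fragments_at(node):
    """Enumerate every fragment rooted at this node of a corpus tree."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            # Either cut the child off (leaving a site) or expand it further.
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def nodes(tree):
    """Yield every nonterminal node of a tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def fragment_counts(corpus):
    """Count how often each fragment occurs anywhere in the corpus."""
    counts = Counter()
    for tree in corpus:
        for node in nodes(tree):
            counts.update(fragments_at(node))
    return counts

def root_totals(counts):
    """Total fragment count per root label (the assumed normaliser)."""
    totals = Counter()
    for frag, n in counts.items():
        totals[frag[0]] += n
    return totals

def cut_sites(fragment, node):
    """Yield the subtrees of node that sit at the fragment's open sites."""
    for f_child, t_child in zip(fragment[1:], node[1:]):
        if isinstance(f_child, str):
            continue
        if len(f_child) == 1:
            yield t_child
        else:
            yield from cut_sites(f_child, t_child)

def parse_probability(tree, counts, totals):
    """Sum, over all derivations of tree, the product of fragment probabilities."""
    if totals[tree[0]] == 0:
        return 0.0  # root label never seen in the corpus
    total = 0.0
    for frag in fragments_at(tree):
        p = counts[frag] / totals[frag[0]]
        for sub in cut_sites(frag, tree):
            p *= parse_probability(sub, counts, totals)
        total += p
    return total

# Toy corpus of two annotated utterances (invented for illustration).
corpus = [
    ("S", ("NP", "she"), ("VP", "sleeps")),
    ("S", ("NP", "he"), ("VP", "runs")),
]
counts = fragment_counts(corpus)
totals = root_totals(counts)
new_parse = ("S", ("NP", "she"), ("VP", "runs"))
print(parse_probability(new_parse, counts, totals))  # prints 0.1875
```

In this toy case the unseen tree for "she runs" still receives a non-zero probability, because three different derivations assemble it from fragments of the two corpus trees. A full disambiguator would compute this quantity for every candidate parse of an input and select the highest-scoring one.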

Journal:
  • CoRR

Volume: cmp-lg/9611003

Publication date: 1996